
    Factoring out ordered sections to expose thread-level parallelism

    With the rise of multi-core processors, researchers are taking a new look at extending the applicability of auto-parallelization techniques. In this paper, we identify a dependence pattern on which auto-parallelization currently fails. This dependence pattern occurs for ordered sections, i.e. code fragments in a loop that must be executed atomically and in the original program order. We discuss why these ordered sections prevent current auto-parallelizers from working and we present a technique to deal with them. We experimentally demonstrate the efficacy of the technique, yielding significant overall program speedups.
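
    As an illustration of the dependence pattern described above, the sketch below shows a loop whose body is largely parallel except for one ordered section. It uses OpenMP's ordered construct as a stand-in for the paper's setting (the paper targets auto-parallelization, not OpenMP), and compute() and emit() are hypothetical placeholder functions.

        /* compile with: cc -fopenmp ordered.c */
        #include <stdio.h>

        /* Hypothetical stage functions: compute() is independent per
         * iteration; emit() must run atomically and in original
         * iteration order -- the "ordered section". */
        static double compute(int i) { return i * 0.5; }
        static void emit(int i, double v) { printf("%d %f\n", i, v); }

        int main(void) {
            #pragma omp parallel for ordered
            for (int i = 0; i < 100; i++) {
                double v = compute(i);   /* runs in parallel, any order */
                #pragma omp ordered
                emit(i, v);              /* serialized, program order   */
            }
            return 0;
        }

    As the title suggests, factoring the ordered section out of the loop body is what exposes the remaining, fully independent work to thread-level parallelism.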

    Can we apply accelerator-cores to control-intensive programs?

    There is a trend towards using accelerators to increase the performance and energy efficiency of general-purpose processors. So far, most accelerators have been built with HPC applications in mind. A question that arises is how well other applications can benefit from these accelerators. In this paper, we discuss the acceleration of three benchmarks using the SPUs of a Cell-BE. We analyze the potential speedup given the inherent parallelism in the applications. While the potential speedup is significant in all benchmarks, the obtained speedup lags behind due to a mismatch between the micro-architectural properties of the accelerators and the benchmark properties.
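
    The potential-versus-obtained speedup analysis above is essentially an Amdahl's-law argument: the inherently parallel fraction of a benchmark bounds the speedup that any number of accelerator cores can deliver. A minimal sketch, with made-up numbers rather than the paper's measurements:

        #include <stdio.h>

        /* Amdahl's law: if a fraction p of the work is parallelizable
         * across n cores, the speedup is bounded by 1 / ((1 - p) + p / n).
         * The values below are illustrative, not the paper's data. */
        static double amdahl(double p, int n) {
            return 1.0 / ((1.0 - p) + p / n);
        }

        int main(void) {
            double p = 0.90;  /* hypothetical parallel fraction of a benchmark */
            int    n = 6;     /* hypothetical number of SPUs used              */
            printf("potential speedup: %.2fx\n", amdahl(p, n));  /* 4.00x */
            return 0;
        }

    The micro-architectural mismatch identified in the abstract means the parallel part executes less efficiently on the SPUs than this idealized bound assumes, which is why the obtained speedup lags behind.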

    D4.2 Programming Language and Runtime System: Early Prototype (executive Summary)

    This document presents the executive summary of the deliverable on Programming Language and Runtime System: Early Prototype, which aims at describing the core functionality of the VINEYARD programming model and runtime system for accelerated data centres. We describe our approach to creating an abstract representation of accelerated kernels, such that application programmers can use these kernels without needing to worry about accelerator-specific calling conventions, or about the specific versions available in the VINEYARD accelerator library. The second key contribution of this document is the description of our approach to virtualizing accelerators. We assume that accelerators are assigned to jobs only when they are really needed, and not at job allocation time. This raises issues that need to be addressed in the virtualization layer and also in the application’s runtime. We describe these issues and our approach to solving them.
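
    A rough sketch of what an accelerator-agnostic kernel abstraction of this kind could look like is given below; the types and function names are hypothetical and do not reflect the actual VINEYARD interface.

        #include <stdio.h>
        #include <stddef.h>

        /* Hypothetical descriptor for an accelerated kernel: the application
         * names the kernel it wants and passes opaque arguments; the runtime
         * selects a concrete implementation from the accelerator library and
         * handles the accelerator-specific calling convention. */
        typedef struct {
            const char *name;    /* logical kernel name, e.g. "matmul" */
            void      **args;    /* opaque argument buffers            */
            size_t     *sizes;   /* argument sizes in bytes            */
            int         nargs;
        } kernel_request;

        /* Stub launch: a real runtime would bind a physical accelerator only
         * here, at launch time, rather than at job-allocation time. */
        static int runtime_launch(const kernel_request *req) {
            printf("launching '%s' with %d argument(s)\n", req->name, req->nargs);
            return 0;
        }

        int main(void) {
            float in[4] = {1, 2, 3, 4}, out[4] = {0};
            void *args[]   = { in, out };
            size_t sizes[] = { sizeof in, sizeof out };
            kernel_request req = { "vector_scale", args, sizes, 2 };
            return runtime_launch(&req);
        }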

    Reducing the burden of parallel loop schedulers for many-core processors


    An experimental study on performance portability of OpenCL kernels

    Accelerator processors allow energy-efficient computation at high performance, especially for computation-intensive applications. There exists a plethora of different accelerator architectures, such as GPUs and the Cell Broadband Engine. Each accelerator has its own programming language, but the recently introduced OpenCL language unifies accelerator programming languages. Hereby, OpenCL achieves functional portability, which reduces the development time of kernels. Functional portability, however, has limited value without performance portability: the possibility to re-use optimized kernels with good performance. This paper investigates how specific code optimizations are to the accelerator architecture and how severe the resulting lack of performance portability is.
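
    As a concrete example of functional portability, the minimal OpenCL C kernel below (a generic vector addition, not one of the paper's kernels; host-side setup omitted) compiles and runs on any OpenCL device.

        /* A functionally portable OpenCL C kernel: the same source builds
         * for GPUs, CPUs and other OpenCL devices. Device-specific tuning
         * such as work-group size, memory layout or explicit vectorization
         * is not expressed here, and that is exactly the kind of
         * architecture-specific optimization the paper investigates. */
        __kernel void vadd(__global const float *a,
                           __global const float *b,
                           __global float *c,
                           const unsigned int n)
        {
            size_t i = get_global_id(0);
            if (i < n)
                c[i] = a[i] + b[i];
        }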

    Parallel Programming of General-Purpose Programs Using Task-Based Programming Models

    The prevalence of multicore processors is bound to drive most kinds of software development towards parallel programming. To limit the difficulty and overhead of parallel software design and maintenance, it is crucial that parallel programming models allow an easy-to-understand, concise and dense representation of parallelism. Parallel programming models such as Cilk++ and Intel TBB attempt to offer a better, higher-level abstraction for parallel programming than threads and locking synchronization. It is not straightforward, however, to express all patterns of parallelism in these models. Pipelines are an important parallel construct, yet they are difficult to express in Cilk and TBB in a straightforward way without a verbose restructuring of the code. In this paper we demonstrate that pipeline parallelism can be easily and concisely expressed in a Cilk-like language, which we extend with input, output and input/output dependency types on procedure arguments, enforced at runtime by the scheduler. We evaluate our implementation on real applications and show that our Cilk-like scheduler, extended to track and enforce these dependencies, has performance comparable to Cilk++.
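
    The paper extends a Cilk-like language with dependency types on procedure arguments; that syntax is not reproduced here. As a rough analogue, the sketch below expresses a two-stage pipeline with OpenMP task depend clauses, where the scheduler likewise derives ordering from per-argument in/out annotations; produce() and transform() are hypothetical stages.

        /* compile with: cc -fopenmp pipeline.c */
        #include <stdio.h>
        #define N 8

        static int produce(int i)   { return i * i; }   /* pipeline stage 1 */
        static int transform(int x) { return x + 1; }   /* pipeline stage 2 */

        int main(void) {
            int data[N], result[N];

            #pragma omp parallel
            #pragma omp single
            {
                for (int i = 0; i < N; i++) {
                    /* stage 1 writes data[i]: an output dependence */
                    #pragma omp task depend(out: data[i])
                    data[i] = produce(i);

                    /* stage 2 reads data[i], writes result[i]: an input and
                     * an output dependence. Stage 2 of item i can overlap
                     * with stage 1 of item i+1 -- pipeline parallelism. */
                    #pragma omp task depend(in: data[i]) depend(out: result[i])
                    result[i] = transform(data[i]);
                }
                #pragma omp taskwait   /* wait for all pipeline tasks */
            }

            for (int i = 0; i < N; i++)
                printf("%d ", result[i]);
            printf("\n");
            return 0;
        }

    The point of the annotations, in both the sketch and the paper's model, is that the pipeline is declared through the data each stage reads and writes rather than hand-built from threads and locks.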